Prototype reduction techniques: A comparison among different approaches

Authors

  • Loris Nanni
  • Alessandra Lumini
Abstract

The two main drawbacks of nearest neighbor based classifiers are high CPU cost when the training set contains many samples and performance that is extremely sensitive to outliers. Several attempts at overcoming these drawbacks have been proposed in the pattern recognition field, aimed at selecting or generating an adequate subset of prototypes from the training set. This paper compares methods for prototype reduction. Several methods for finding a good set of prototypes are evaluated: particle swarm optimization, a clustering algorithm, a genetic algorithm, and learning prototypes and distances. Experiments are carried out on several classification problems to evaluate the considered approaches in conjunction with different nearest neighbor based classifiers: the 1-nearest-neighbor classifier, the 5-nearest-neighbor classifier, the nearest feature plane based classifier, and the nearest feature line based classifier. Moreover, we propose a method for creating an ensemble of classifiers, where each classifier is trained with a different reduced set of prototypes. Since these prototypes are generated using a supervised optimization function, we call our ensemble "supervised bagging". The training phase consists of repeating the prototype generation N times; the scores resulting from classifying a test pattern with each set of prototypes are then combined by the "vote rule". The reported results show the superiority of this method over the well-known bagging approach for building ensembles of classifiers. Our results are obtained when the 1-nearest-neighbor classifier is coupled with a "supervised" bagging ensemble of learning prototypes and distances. As expected, the prototype reduction approaches proposed for the 1-nearest-neighbor classifier do not work as well when other classifiers are tested. In our experiments, the best method for prototype reduction when different classifiers are used is the genetic algorithm.
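The supervised bagging procedure described above (repeat a supervised prototype generation N times, classify a test pattern with 1-NN against each reduced set, combine the decisions by majority vote) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper's prototype generators (particle swarm optimization, genetic algorithm, learning prototypes and distances) are replaced by a hypothetical random-mutation hill climber, `select_prototypes`, which keeps `per_class` training samples per class and accepts a same-class swap whenever 1-NN training accuracy does not drop. It is assumed here that ensemble diversity comes only from the random initial selection and the random swaps; all function and parameter names are illustrative.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nn_predict(prototypes, x):
    """1-NN rule: label of the closest (vector, label) prototype."""
    return min(prototypes, key=lambda p: sq_dist(p[0], x))[1]

def accuracy(prototypes, data):
    """Fraction of (vector, label) pairs in data that 1-NN classifies correctly."""
    return sum(nn_predict(prototypes, x) == y for x, y in data) / len(data)

def select_prototypes(train, per_class, iters, rng):
    """Hypothetical supervised prototype selection (stand-in for PSO/GA/LPD).

    Starts from a random subset of per_class samples per class, then tries
    `iters` random single-prototype swaps within a class, keeping a swap when
    1-NN accuracy on the training set does not decrease.
    """
    by_class = {}
    for x, y in train:
        by_class.setdefault(y, []).append((x, y))
    current = [p for pts in by_class.values()
               for p in rng.sample(pts, min(per_class, len(pts)))]
    best = accuracy(current, train)
    for _ in range(iters):
        i = rng.randrange(len(current))
        candidate = current.copy()
        candidate[i] = rng.choice(by_class[current[i][1]])  # same-class replacement
        score = accuracy(candidate, train)
        if score >= best:
            current, best = candidate, score
    return current

def supervised_bagging(train, n_members, per_class, iters, seed=0):
    """Repeat the (hypothetical) supervised prototype generation n_members times."""
    rng = random.Random(seed)
    return [select_prototypes(train, per_class, iters, rng) for _ in range(n_members)]

def vote_predict(ensemble, x):
    """Vote rule: majority label over the 1-NN decisions of all members."""
    votes = Counter(nn_predict(protos, x) for protos in ensemble)
    return votes.most_common(1)[0][0]

# Toy data (illustrative only): two well-separated square clusters.
data_rng = random.Random(42)
train = [((data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)), 0) for _ in range(20)]
train += [((10 + data_rng.uniform(-1, 1), 10 + data_rng.uniform(-1, 1)), 1) for _ in range(20)]

ensemble = supervised_bagging(train, n_members=5, per_class=2, iters=20, seed=1)
print(vote_predict(ensemble, (0.2, -0.3)))   # 0
print(vote_predict(ensemble, (9.5, 10.4)))   # 1
```

Every member is tuned to the training labels by the same supervised objective but starts from a different random subset, so the members differ from one another. Standard bagging, by contrast, builds each member from a bootstrap resample of the training set with no supervised selection step.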
Currently, several machine learning applications require managing extremely large data sets for data mining or classification. In many problems, a general-purpose classifier based on the distance from a set of prototypes, i.e. the nearest neighbor (NN) classification rule, has been used successfully. The good behavior of nearest neighbor based classifiers depends on the number of prototypes, but in many practical pattern recognition applications only a small number of prototypes is available and, typically, this limitation strongly degrades the ideal asymptotic behavior of nearest neighbor based classifiers (Bezdek & Kuncheva, 2001; Dasarathy, 1991). Unfortunately, another strong limitation exists: the computational cost of a nearest neighbor based classifier increases with the …
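The cost argument can be made concrete. A brute-force 1-NN rule compares a query against every stored prototype, so its per-query work grows linearly with the number of prototypes. The nearest feature line (NFL) classifier named in the abstract evaluates every pair of same-class prototypes, so its work grows quadratically with class size. The sketch below assumes the common textbook NFL definition (the query is projected onto the infinite line through two same-class prototypes, and the class of the nearest line wins); the names `nfl_distance`, `nfl_predict` and `count_feature_lines` are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def nfl_distance(q, xi, xj):
    """Euclidean distance from query q to the line through prototypes xi and xj."""
    d = [b - a for a, b in zip(xi, xj)]           # direction of the feature line
    denom = dot(d, d)
    # Position parameter mu of the projection; coincident prototypes degenerate to a point.
    mu = 0.0 if denom == 0 else dot([qk - a for qk, a in zip(q, xi)], d) / denom
    proj = [a + mu * dk for a, dk in zip(xi, d)]  # projection of q onto the line
    return math.sqrt(sum((qk - pk) ** 2 for qk, pk in zip(q, proj)))

def nfl_predict(prototypes, q):
    """Hypothetical NFL rule over (vector, label) pairs.

    Returns (predicted label, number of feature lines evaluated).
    Classes with a single prototype contribute no lines in this sketch.
    """
    by_class = {}
    for x, y in prototypes:
        by_class.setdefault(y, []).append(x)
    best_label, best_dist, n_lines = None, math.inf, 0
    for label, xs in by_class.items():
        for xi, xj in combinations(xs, 2):        # every within-class pair
            n_lines += 1
            dist = nfl_distance(q, xi, xj)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, n_lines

def count_feature_lines(class_sizes):
    """Number of lines NFL evaluates per query: sum of n_c * (n_c - 1) / 2."""
    return sum(n * (n - 1) // 2 for n in class_sizes)

protos = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((0.0, 5.0), "b"), ((1.0, 5.0), "b")]
# (3.0, 0.5) lies past both segments, but NFL measures distance to the whole line.
print(nfl_predict(protos, (3.0, 0.5)))                                  # ('a', 2)
print(count_feature_lines([100, 100]), count_feature_lines([10, 10]))  # 9900 90
```

Shrinking each of two classes from 100 to 10 prototypes cuts the number of feature lines per query from 9900 to 90, whereas the 1-NN comparisons drop only from 200 to 20. This illustrates why prototype reduction matters even more for NFL than for plain 1-NN.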

Similar sources

The Effect of Semantic Mapping as a Vocabulary Instruction Technique on EFL Learners with Different Perceptual Learning Styles

Traditional and modern vocabulary instruction techniques have been introduced in the past few decades to improve the learners’ performance in reading comprehension. Semantic mapping, which entails drawing learners’ attention to the interrelationships among lexical items through graphic organizers, is claimed to enhance vocabulary learning significantly. However, whether this technique suits all...

Survey of New Approaches on Prototype Selection and Generation

Prototype Selection and Prototype Generation are two lively research fields. As time goes by, new methods are developed following the basic guidelines that any Prototype Reduction algorithm should accomplish. These new methods often offer the community different viewpoints on the problem, as well as better and more efficient approaches. This technical report provides a survey of the newest Pr...

Estimation of Phosphorus Reduction from Wastewater by Artificial Neural Network, Random Forest and M5P Model Tree Approaches

This study aims to examine the ability of free-floating aquatic plants to remove phosphorus and to predict the reduction of phosphorus from rice mill wastewater using soft computing techniques. A mesocosm study was conducted at the mill premises under normal conditions, and reliable results were obtained. Four aquatic plants, namely water hyacinth, water lettuce, salvinia, and duckweed were use...

The Comparison of Direct and Indirect Optimization Techniques in Equilibrium Analysis of Multibody Dynamic Systems

The present paper describes a set of procedures for the solution of nonlinear static-equilibrium problems in the complex multibody mechanical systems. To find the equilibrium position of the system, five optimization techniques are used to minimize the total potential energy of the system. Comparisons are made between these techniques. A computer program is developed to evaluate the equality co...

Analysis and Comparison of PAPR Reduction Techniques in OFDM Systems

The destructive impact of fading environments and bandwidth limitations are two main challenges that communication systems must deal with. These challenges can affect the growth of wireless communication and can even prevent reliable, high-data-rate communication. Thus, OFDM (Orthogonal Frequency Division Multiplexing) modulation using fast computation hardware such ...

Journal title:
  • Expert Syst. Appl.

Volume: 38

Publication date: 2011